12 research outputs found

    How Hard is Counting Triangles in the Streaming Model

    Full text link
    The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized algorithm to solve this problem. If the algorithm is allowed one pass over the stream, we present a best possible lower bound of $\Omega(m)$ for graphs $G$ with $m$ edges on $n$ vertices. If a constant number of passes is allowed, we show a lower bound of $\Omega(m/T)$, where $T$ is the number of triangles. We match, in some sense, this lower bound with a 2-pass $O(m/T^{1/3})$-memory algorithm that solves the problem of distinguishing graphs with no triangles from graphs with at least $T$ triangles. We present a new graph parameter $\rho(G)$, the triangle density, and conjecture that the space complexity of the triangles problem is $\Omega(m/\rho(G))$. We match this by a second algorithm that solves the distinguishing problem using $O(m/\rho(G))$ memory.

    Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

    Full text link
    The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real-world applications such as spam detection, uncovering the hidden thematic structure of the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the set of vertices into a high-degree and a low-degree subset, as in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a running time of $O\left(m + \frac{m^{3/2} \Delta \log n}{t \epsilon^2}\right)$ and an $\epsilon$-approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges and $\Delta$ the maximum number of triangles an edge is contained in. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2} \log n + \frac{m^{3/2} \Delta \log n}{t \epsilon^2}\right)$ and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance. Comment: 12 pages; to appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010).
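
    The sampling idea mentioned in the abstract above can be illustrated with a generic edge-sampling estimator. This is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes the sampling step keeps each edge independently with probability p and rescales the triangle count of the sampled graph by 1/p^3, and it omits the high-degree/low-degree vertex partitioning entirely. The function names `count_triangles` and `estimate_triangles` are hypothetical.

```python
import random


def count_triangles(edges):
    """Exactly count triangles in an undirected simple graph given as an edge list."""
    # Normalise edges: drop self-loops and duplicates, ignore orientation.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    adj = {}
    for u, v in unique:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once from each of its three edges.
    total = sum(len(adj[u] & adj[v]) for u, v in unique)
    return total // 3


def estimate_triangles(edges, p, seed=None):
    """Estimate the triangle count from an edge sample (assumed scheme, see lead-in).

    Each edge is kept independently with probability p. A triangle survives
    only if all three of its edges are kept, which happens with probability
    p**3, so dividing the sampled count by p**3 gives an unbiased estimate.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p**3


if __name__ == "__main__":
    # Complete graph on 4 vertices: every 3-subset of vertices is a triangle.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))
    print(estimate_triangles(k4, p=0.5, seed=1))
```

    With p = 1 the estimator keeps every edge and reduces to the exact count. Smaller values of p trade accuracy for less work, and the variance grows when many triangles share edges, which is the kind of effect the parameter $\Delta$ in the stated bounds accounts for.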

    Graph Sketching

    No full text

    Approximate Counting of Cycles in Streams

    No full text
    Subgraph counting is a fundamental problem in algorithm design and has many applications in data mining, biology, social networks, and many other domains. Over the past years this problem has been studied extensively from a theoretical point of view. Because of the intensive computational resources required, traditional algorithms are infeasible even for medium-sized graphs. A natural way to address this problem in a massive graph is to use the data streaming model, where edges arrive in an arbitrary order and the algorithm is required to use limited memory to approximate the number of subgraphs. Prior to our work, most subgraph counting algorithms were based on edge sampling. In this paper we develop a novel approach for counting cycles of an arbitrary but fixed size in the turnstile model, i.e., the input stream is a sequence of edge insertions and deletions. Our algorithm is based on the idea of computing instances of complex-valued random variables over the given stream, and improves drastically upon the naïve sampling algorithms. In contrast to most existing approaches, our algorithm can also be easily applied in the distributed setting. We believe that the idea of using complex-valued random variables will find further applications, in particular for counting more general subgraphs.

    Counting arbitrary subgraphs in data streams

    No full text
    We study the subgraph counting problem in data streams. We provide the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G. Our estimator works in the turnstile model, i.e., it can handle both edge insertions and edge deletions, and is applicable in a distributed setting. Prior to this work, estimators were known only for a few non-regular graphs, and only in the case of edge insertions, leaving the problem of counting general subgraphs in the turnstile model wide open. We further demonstrate the applicability of our estimator by analyzing its concentration for several graphs H and the case where G is a power law graph.

    Annotations in Data Streams

    No full text
    The central goal of data stream algorithms is to process massive streams of data using sublinear storage space. Motivated by work in the database community on outsourcing database and data stream processing, we ask whether the space usage of such algorithms can be further reduced by enlisting a more powerful "helper" who can annotate the stream as it is read. We do not wish to blindly trust the helper, so we require that the algorithm be convinced of having computed a correct answer. We show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it. We also prove lower bounds on such tradeoffs, often nearly matching the upper bounds, via notions related to Merlin-Arthur communication complexity. Our results cover the classic data stream problems of selection, frequency moments, and fundamental graph problems such as triangle-freeness and connectivity. Our work is also part of a growing trend (including recent studies of multi-pass streaming, read/write streams and randomly ordered streams) of asking more complexity-theoretic questions about data stream processing. It is a recognition that, in addition to practical relevance, the data stream model raises many interesting theoretical questions in its own right.